Power Function Distribution (powerlaw)#
The power function distribution (called `powerlaw` in SciPy) is a simple one-parameter family of continuous distributions on \([0,1]\) with CDF

$$F(x \mid a) = x^a, \qquad 0 \le x \le 1,\quad a > 0.$$

Equivalently, the PDF is

$$f(x \mid a) = a\,x^{a-1}, \qquad 0 < x < 1.$$
It’s useful whenever you need a flexible model for a bounded proportion that can concentrate near 0 or near 1. It also appears naturally in order statistics: the maximum of \(n\) i.i.d. Uniform\((0,1)\) variables has powerlaw distribution with \(a=n\).
What you’ll learn#
- classification, support, and parameterization (including SciPy's `loc`/`scale`)
- PDF/CDF/PPF and connections to Beta and order statistics
- moments (mean/variance/skewness/kurtosis), MGF/CF, and entropy
- how the shape parameter \(a\) changes the distribution
- derivations: expectation, variance, likelihood, and the MLE for \(a\)
- NumPy-only sampling via inverse CDF + Monte Carlo validation
- practical usage via `scipy.stats.powerlaw` (`pdf`, `cdf`, `rvs`, `fit`)
- hypothesis testing, Bayesian modeling, and generative modeling patterns
Notebook roadmap#
Title & classification
Intuition & motivation
Formal definition (PDF/CDF)
Moments & properties
Parameter interpretation
Derivations (\(\mathbb{E}[X]\), \(\mathrm{Var}(X)\), likelihood)
Sampling & simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy integration (`scipy.stats.powerlaw`)
Statistical use cases
Pitfalls
Summary
import math
import numpy as np
import scipy
from scipy import special, stats
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
# Reproducibility
SEED = 7
rng = np.random.default_rng(SEED)
np.set_printoptions(precision=4, suppress=True)
print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
numpy 1.26.2
scipy 1.15.0
plotly 6.5.2
Prerequisites & notation#
Prerequisites
comfort with basic calculus (single-variable integration)
basic probability (PDF/CDF, expectation, variance)
basic statistics (likelihood, MLE)
Important terminology
In SciPy, `powerlaw` is the power function distribution on a bounded interval. It is not the heavy-tailed “power-law” distribution often used for wealth/city sizes (see `scipy.stats.pareto`, `scipy.stats.zipf`, etc.).

SciPy parameterization

SciPy uses a location-scale family: `scipy.stats.powerlaw(a, loc=ℓ, scale=s)` has support \(x \in [\ell,\, \ell+s]\) with \(s>0\). The canonical form in this notebook uses \(\ell=0\), \(s=1\).
1) Title & Classification#
- Name: `powerlaw` (power function distribution)
- Type: Continuous
- Support (canonical): \(x \in [0,1]\) (density is defined for \(0<x<1\))
- Parameter space (canonical): shape \(a>0\)
With SciPy’s location-scale parameters:
- Support: \(x \in [\ell, \ell+s]\)
- Parameters: \(a>0\), \(\ell \in \mathbb{R}\), \(s>0\)
We write:

$$X \sim \texttt{powerlaw}(a).$$
2) Intuition & Motivation#
2.1 What it models#
`powerlaw` models a bounded random variable on \([0,1]\) whose probability mass can be “tilted” toward 0 or toward 1 with a single shape parameter \(a\).
A useful way to think about it is via the transformation:
If \(U \sim \mathrm{Uniform}(0,1)\), then

$$X = U^{1/a} \sim \texttt{powerlaw}(a).$$

So \(a\) controls how aggressively the uniform is “stretched” toward 0 (\(a<1\)) or toward 1 (\(a>1\)).
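A quick way to see the tilt is to push uniforms through \(u \mapsto u^{1/a}\) and check how much mass lands below 0.5 (a minimal NumPy sketch; the seed and the values of \(a\) are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random(100_000)

# u**(1/a) pushes mass toward 0 for a < 1 and toward 1 for a > 1.
# In theory P(X < 0.5) = F(0.5) = 0.5**a.
for a in (0.5, 1.0, 2.0):
    x = u ** (1.0 / a)
    print(f"a={a}: mean={x.mean():.3f}  P(X<0.5)~{(x < 0.5).mean():.3f}  theory={0.5**a:.3f}")
```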
2.2 Real-world use cases#
- Proportions that lean toward 0 or 1: completion rates, saturation fractions, normalized scores
- Order statistics: maxima of uniforms (e.g., “best-of-\(n\)” effects)
- P-values under alternatives (simple model): p-values often concentrate near 0; \(\mathrm{Beta}(a,1)\) with \(a<1\) is a common approximation
- Simple generative noise on \([0,1]\): random thresholds, random quantiles
2.3 Relations to other distributions#
- Beta: `powerlaw(a)` is exactly \(\mathrm{Beta}(a,1)\).
- Uniform: \(a=1\) gives \(f(x)=1\) on \([0,1]\).
- Exponential (via log transform): if \(X \sim \texttt{powerlaw}(a)\), then \(-\log X \sim \mathrm{Exponential}(\text{rate}=a)\).
- Order statistic: if \(U_1,\dots,U_n\) are i.i.d. Uniform\((0,1)\), then \(\max_i U_i \sim \texttt{powerlaw}(n)\).
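These relations are easy to spot-check numerically (a small sketch using `scipy.stats`; the grid and parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.05, 0.95, 10)
a = 3.0

# powerlaw(a) == Beta(a, 1): identical PDFs
assert np.allclose(stats.powerlaw(a).pdf(x), stats.beta(a, 1.0).pdf(x))

# a = 1 reduces to Uniform(0, 1): flat density
assert np.allclose(stats.powerlaw(1.0).pdf(x), 1.0)

# max of n uniforms: P(max U_i <= x) = x**n = powerlaw(n).cdf(x)
n = 4
assert np.allclose(stats.powerlaw(float(n)).cdf(x), x ** n)
print("all relations check out")
```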
3) Formal Definition#
3.1 Canonical PDF#
For \(a>0\) and \(0<x<1\):

$$f(x \mid a) = a\,x^{a-1}.$$
3.2 Canonical CDF#
The CDF is

$$F(x \mid a) = x^a, \qquad 0 \le x \le 1.$$
3.3 Quantile function (inverse CDF)#
For \(u\in(0,1)\):

$$F^{-1}(u) = u^{1/a}.$$
This yields a simple inverse-transform sampler.
3.4 Location-scale form (SciPy)#
If \(X\sim\texttt{powerlaw}(a)\) on \([0,1]\) and we define \(Y = \ell + sX\) with \(s>0\), then \(Y\) has support \([\ell,\ell+s]\) and

$$f_Y(y \mid a, \ell, s) = \frac{a}{s}\left(\frac{y-\ell}{s}\right)^{a-1}, \qquad \ell < y < \ell + s.$$
def powerlaw_pdf(x: np.ndarray, a: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''Powerlaw (power function) PDF with SciPy-style loc/scale.
Canonical form corresponds to loc=0, scale=1.
Notes
-----
For a < 1, the density diverges as x -> loc (from the right).
'''
if a <= 0:
raise ValueError("a must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
out = np.zeros_like(z, dtype=float)
mask = (z > 0) & (z < 1)
# Compute in log-space for numerical stability
log_pdf = (math.log(a) - math.log(scale)) + (a - 1.0) * np.log(z[mask])
out[mask] = np.exp(log_pdf)
return out
def powerlaw_cdf(x: np.ndarray, a: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''Powerlaw CDF with SciPy-style loc/scale.'''
if a <= 0:
raise ValueError("a must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
out = np.zeros_like(z, dtype=float)
out[z >= 1.0] = 1.0
mask = (z > 0) & (z < 1)
out[mask] = np.power(z[mask], a)
return out
def powerlaw_ppf(u: np.ndarray, a: float, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''Powerlaw quantile function (inverse CDF).'''
if a <= 0:
raise ValueError("a must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
u = np.asarray(u, dtype=float)
if np.any((u < 0) | (u > 1)):
raise ValueError("u must be in [0, 1]")
return loc + scale * np.power(u, 1.0 / a)
# Quick sanity check: PDF integrates to ~1 on [0,1]
a0 = 2.5
xgrid = np.linspace(1e-6, 1 - 1e-6, 400_000)
area = np.trapz(powerlaw_pdf(xgrid, a0), xgrid)
area
0.9999975000038249
4) Moments & Properties#
A convenient closed form exists for raw moments. For \(k>-a\):

$$\mathbb{E}[X^k] = \int_0^1 x^k \, a x^{a-1}\,dx = \frac{a}{a+k}.$$
4.1 Mean and variance#
Using \(k=1\) and \(k=2\):

$$\mathbb{E}[X] = \frac{a}{a+1}, \qquad \mathrm{Var}(X) = \frac{a}{a+2} - \left(\frac{a}{a+1}\right)^2 = \frac{a}{(a+1)^2(a+2)}.$$
4.2 Skewness and kurtosis#
Because `powerlaw(a)` is \(\mathrm{Beta}(a,1)\), we can reuse standard beta formulas:

$$\text{skewness} = \frac{2(1-a)\sqrt{a+2}}{(a+3)\sqrt{a}}, \qquad \text{excess kurtosis} = \frac{6\left[(a-1)^2(a+2) - a(a+3)\right]}{a(a+3)(a+4)}.$$
4.3 MGF and characteristic function#
The MGF exists for all real \(t\) (bounded support). A compact special-function expression is

$$M(t) = \mathbb{E}\!\left[e^{tX}\right] = {}_1F_1(a;\, a+1;\, t),$$

where \({}_1F_1\) is the confluent hypergeometric function. The characteristic function is

$$\varphi(\omega) = \mathbb{E}\!\left[e^{i\omega X}\right] = {}_1F_1(a;\, a+1;\, i\omega).$$
4.4 Entropy#
The differential entropy (in nats) has a simple closed form:

$$h(a) = 1 - \frac{1}{a} - \log a.$$
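As a sanity check, the closed form agrees with SciPy's generic `entropy` method (a minimal sketch; the \(a\) values are arbitrary):

```python
import math
from scipy import stats

# h(a) = 1 - 1/a - log(a), in nats; a = 1 gives the uniform's entropy, 0.
for a in (0.5, 1.0, 2.5):
    closed = 1.0 - 1.0 / a - math.log(a)
    scipy_h = float(stats.powerlaw(a).entropy())
    print(f"a={a}: closed={closed:.6f}  scipy={scipy_h:.6f}")
```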
4.5 Mode (shape intuition)#
- If \(a>1\), the density increases on \((0,1)\) and the mode is at \(x=1\).
- If \(a=1\), the distribution is uniform.
- If \(0<a<1\), the density decreases and diverges at 0 (mode at \(x=0\)).
def powerlaw_moments(a: float) -> dict:
'''Key moments and summary properties for the canonical powerlaw(a) on [0,1].'''
if a <= 0:
raise ValueError("a must be > 0")
mean = a / (a + 1.0)
var = a / ((a + 1.0) ** 2 * (a + 2.0))
skew = (2.0 * (1.0 - a) * math.sqrt(a + 2.0)) / ((a + 3.0) * math.sqrt(a))
excess_kurt = (6.0 * (((a - 1.0) ** 2) * (a + 2.0) - a * (a + 3.0))) / (a * (a + 3.0) * (a + 4.0))
kurt = excess_kurt + 3.0
entropy = 1.0 - 1.0 / a - math.log(a)
return {
"mean": mean,
"variance": var,
"skewness": skew,
"kurtosis": kurt,
"excess_kurtosis": excess_kurt,
"entropy_nats": entropy,
}
def powerlaw_mgf(t: np.ndarray, a: float) -> np.ndarray:
'''MGF M(t) = E[e^{tX}] using SciPy's confluent hypergeometric 1F1.'''
if a <= 0:
raise ValueError("a must be > 0")
t = np.asarray(t)
return special.hyp1f1(a, a + 1.0, t)
def powerlaw_cf(w: np.ndarray, a: float) -> np.ndarray:
'''Characteristic function phi(w) = E[e^{i w X}].'''
if a <= 0:
raise ValueError("a must be > 0")
w = np.asarray(w)
return special.hyp1f1(a, a + 1.0, 1j * w)
# Spot-check: mean/variance via Monte Carlo and an MGF identity
a0 = 2.5
n = 200_000
u = rng.random(n)
# avoid exact 0 for stability in downstream logs
u = np.clip(u, np.finfo(float).tiny, 1.0)
samples = u ** (1.0 / a0)
mom = powerlaw_moments(a0)
mc_mean = samples.mean()
mc_var = samples.var(ddof=0)
# MGF check: E[e^{tX}] at t=1.2
t = 1.2
mc_mgf = np.mean(np.exp(t * samples))
theory_mgf = float(powerlaw_mgf(t, a0))
mom, (mc_mean, mc_var, mc_mgf, theory_mgf)
({'mean': 0.7142857142857143,
'variance': 0.045351473922902494,
'skewness': -0.7318040653635675,
'kurtosis': 2.7566433566433566,
'excess_kurtosis': -0.24335664335664337,
'entropy_nats': -0.31629073187415513},
(0.7148571686605475,
0.045105947985351946,
2.430906200738553,
2.429630435055177))
5) Parameter Interpretation#
`powerlaw` has a single shape parameter \(a\).
5.1 What (a) does#
- \(0<a<1\): mass concentrates near 0; the PDF diverges at 0.
- \(a=1\): uniform on \([0,1]\).
- \(a>1\): mass concentrates near 1; the PDF increases toward 1.
5.2 Mean and median as functions of (a)#
- Mean: \(\mathbb{E}[X]=\frac{a}{a+1}\) (increases toward 1 as \(a\to\infty\))
- Median: solving \(F(m)=1/2\) gives \(m = 2^{-1/a}\)
These simple formulas make it easy to interpret fitted (a) values.
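For example, the closed-form median \(m = 2^{-1/a}\) matches the empirical median of inverse-CDF samples (a quick sketch; seed and \(a\) are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
a = 2.5
x = rng.random(200_000) ** (1.0 / a)  # inverse-CDF sampling

m_theory = 2.0 ** (-1.0 / a)   # solves F(m) = m**a = 1/2
m_emp = float(np.median(x))
print(f"median theory={m_theory:.4f}  empirical={m_emp:.4f}")
```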
# Shape changes: PDF and CDF for different a
a_values = [0.3, 0.7, 1.0, 2.0, 5.0]
x = np.linspace(1e-4, 1 - 1e-4, 500)
fig_pdf = go.Figure()
fig_cdf = go.Figure()
for a in a_values:
fig_pdf.add_trace(go.Scatter(x=x, y=powerlaw_pdf(x, a), mode="lines", name=f"a={a:g}"))
fig_cdf.add_trace(go.Scatter(x=x, y=powerlaw_cdf(x, a), mode="lines", name=f"a={a:g}"))
fig_pdf.update_layout(
title="Powerlaw PDF (canonical on [0,1])",
xaxis_title="x",
yaxis_title="f(x)",
yaxis_type="log",
)
fig_cdf.update_layout(title="Powerlaw CDF", xaxis_title="x", yaxis_title="F(x)")
fig_pdf.show()
fig_cdf.show()
# Mean as a function of a
agrid = np.linspace(0.2, 8, 300)
mean_grid = agrid / (agrid + 1)
fig_mean = go.Figure()
fig_mean.add_trace(go.Scatter(x=agrid, y=mean_grid, mode="lines"))
fig_mean.update_layout(title="Mean E[X] = a/(a+1)", xaxis_title="a", yaxis_title="E[X]")
fig_mean.show()
6) Derivations#
6.1 Expectation#
Starting from the PDF \(f(x\mid a)=a x^{a-1}\) on \((0,1)\):

$$\mathbb{E}[X] = \int_0^1 x \cdot a x^{a-1}\,dx = a\int_0^1 x^{a}\,dx = \frac{a}{a+1}.$$

More generally, for \(k>-a\):

$$\mathbb{E}[X^k] = a\int_0^1 x^{a+k-1}\,dx = \frac{a}{a+k}.$$
6.2 Variance#
Compute \(\mathbb{E}[X^2]=\frac{a}{a+2}\) and subtract the squared mean:

$$\mathrm{Var}(X) = \frac{a}{a+2} - \left(\frac{a}{a+1}\right)^2 = \frac{a}{(a+1)^2(a+2)}.$$
6.3 Likelihood and MLE#
For i.i.d. data \(x_1,\dots,x_n\in(0,1)\), the likelihood is

$$L(a) = \prod_{i=1}^n a x_i^{a-1} = a^n \left(\prod_{i=1}^n x_i\right)^{a-1}.$$

The log-likelihood is

$$\ell(a) = n\log a + (a-1)\sum_{i=1}^n \log x_i.$$

Differentiate and set to zero:

$$\ell'(a) = \frac{n}{a} + \sum_{i=1}^n \log x_i = 0 \quad\Longrightarrow\quad \hat a = -\frac{n}{\sum_{i=1}^n \log x_i}.$$

Because \(\sum_i \log x_i < 0\) for \(x_i\in(0,1)\), the MLE \(\hat a\) is positive.
def powerlaw_loglikelihood(x: np.ndarray, a: float, loc: float = 0.0, scale: float = 1.0) -> float:
'''Log-likelihood for i.i.d. observations under powerlaw(a, loc, scale).'''
if a <= 0:
return -np.inf
if scale <= 0:
return -np.inf
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
if np.any((z <= 0) | (z >= 1)):
return -np.inf
n = z.size
return n * (math.log(a) - math.log(scale)) + (a - 1.0) * float(np.sum(np.log(z)))
def powerlaw_mle_shape(x: np.ndarray, loc: float = 0.0, scale: float = 1.0) -> float:
'''Closed-form MLE for the shape a when loc/scale are known.'''
x = np.asarray(x, dtype=float)
z = (x - loc) / scale
if np.any((z <= 0) | (z >= 1)):
raise ValueError("All observations must satisfy loc < x < loc+scale")
s = float(np.sum(np.log(z))) # negative
n = z.size
return -n / s
# Demonstrate the MLE on simulated data
a_true = 2.5
n = 50_000
x = np.clip(rng.random(n), np.finfo(float).tiny, 1.0) ** (1.0 / a_true)
a_hat = powerlaw_mle_shape(x)
ll_true = powerlaw_loglikelihood(x, a_true)
ll_hat = powerlaw_loglikelihood(x, a_hat)
a_true, a_hat, (ll_true, ll_hat)
(2.5, 2.5199151026621225, (16051.629136719428, 16053.198881460783))
7) Sampling & Simulation#
Because the CDF is \(F(x)=x^a\), inverse transform sampling is immediate.
7.1 Inverse-transform algorithm (NumPy-only)#
1. Sample \(U\sim\mathrm{Uniform}(0,1)\)
2. Return \(X = U^{1/a}\)

This works because:

$$P(X \le x) = P(U^{1/a} \le x) = P(U \le x^a) = x^a = F(x).$$
7.2 Alternative view (log transform)#
If \(X\sim\texttt{powerlaw}(a)\), then \(-\log X\sim\mathrm{Exponential}(\text{rate}=a)\). This can be useful for diagnostics and for building hierarchical models.
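This suggests a simple diagnostic: transform samples by \(-\log\) and compare to an exponential (a small sketch; note SciPy's `expon` uses `scale = 1/rate`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = 1.7
x = rng.random(50_000) ** (1.0 / a)  # powerlaw(a) via inverse CDF

# If X ~ powerlaw(a), then Y = -log X ~ Exponential(rate=a).
y = -np.log(x)
print("mean of -log X:", y.mean(), " vs 1/a =", 1.0 / a)
D, p = stats.kstest(y, stats.expon(scale=1.0 / a).cdf)
print(f"KS D={D:.4f}, p={p:.3f}")
```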
def powerlaw_rvs_numpy(a: float, size: int, rng: np.random.Generator, loc: float = 0.0, scale: float = 1.0) -> np.ndarray:
'''Sample from powerlaw(a, loc, scale) using NumPy-only inverse CDF.'''
if a <= 0:
raise ValueError("a must be > 0")
if scale <= 0:
raise ValueError("scale must be > 0")
u = rng.random(size)
# u can be exactly 0 with tiny probability (finite-precision RNG);
# clipping avoids -inf if you later take logs of samples.
u = np.clip(u, np.finfo(float).tiny, 1.0)
x = u ** (1.0 / a)
return loc + scale * x
# Monte Carlo validation
a0 = 0.6
n = 200_000
samples = powerlaw_rvs_numpy(a0, n, rng)
mom = powerlaw_moments(a0)
mc = {
"mean": samples.mean(),
"variance": samples.var(ddof=0),
"skewness": stats.skew(samples, bias=True),
"excess_kurtosis": stats.kurtosis(samples, fisher=True, bias=True),
}
mom, mc
({'mean': 0.37499999999999994,
'variance': 0.09014423076923074,
'skewness': 0.4625924443258073,
'kurtosis': 1.9468599033816425,
'excess_kurtosis': -1.0531400966183575,
'entropy_nats': -0.15584104290067602},
{'mean': 0.3743840717431062,
'variance': 0.09018237402830535,
'skewness': 0.4607797772203317,
'excess_kurtosis': -1.0573167628324114})
8) Visualization#
We’ll visualize:
- the PDF for multiple values of `a`
- the CDF and an empirical CDF from Monte Carlo samples
- a histogram of Monte Carlo samples with the theoretical PDF overlay
# Histogram + theoretical PDF overlay
a = 0.6
n = 50_000
samples = powerlaw_rvs_numpy(a, n, rng)
x_grid = np.linspace(1e-4, 1 - 1e-4, 600)
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=samples,
nbinsx=80,
histnorm="probability density",
name="Monte Carlo",
opacity=0.6,
)
)
fig.add_trace(go.Scatter(x=x_grid, y=powerlaw_pdf(x_grid, a), mode="lines", name="theoretical PDF"))
fig.update_layout(
title=f"Monte Carlo samples vs theoretical PDF (n={n}, a={a:g})",
xaxis_title="x",
yaxis_title="density",
bargap=0.02,
)
fig.show()
# Empirical CDF vs true CDF
a = 2.0
n = 30_000
samples = powerlaw_rvs_numpy(a, n, rng)
xs = np.sort(samples)
ys = np.arange(1, n + 1) / n
x_grid = np.linspace(0, 1, 400)
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name="empirical CDF"))
fig.add_trace(go.Scatter(x=x_grid, y=powerlaw_cdf(x_grid, a), mode="lines", name="true CDF"))
fig.update_layout(
title=f"Empirical CDF vs true CDF (n={n}, a={a:g})",
xaxis_title="x",
yaxis_title="F(x)",
)
fig.show()
9) SciPy Integration (scipy.stats.powerlaw)#
SciPy provides `scipy.stats.powerlaw`, which implements the same distribution with an additional location-scale transform.

- Canonical form: `stats.powerlaw(a)` has support \([0,1]\).
- Location-scale: `stats.powerlaw(a, loc=ℓ, scale=s)` has support \([\ell, \ell+s]\).

Because `powerlaw(a)` is \(\mathrm{Beta}(a,1)\), you can also verify equivalence via `scipy.stats.beta(a, 1)`.
from scipy.stats import beta, powerlaw
# Define a SciPy powerlaw distribution
a = 2.5
rv = powerlaw(a, loc=0.0, scale=1.0)
x_grid = np.linspace(1e-4, 1 - 1e-4, 500)
# SciPy API
pdf_scipy = rv.pdf(x_grid)
cdf_scipy = rv.cdf(x_grid)
# Compare to our implementations
max_pdf_diff = np.max(np.abs(pdf_scipy - powerlaw_pdf(x_grid, a)))
max_cdf_diff = np.max(np.abs(cdf_scipy - powerlaw_cdf(x_grid, a)))
# Sampling
samples_scipy = rv.rvs(size=50_000, random_state=rng)
# Fit (MLE). If you know data are on [0,1], fix loc/scale.
a_hat_fit, loc_hat, scale_hat = powerlaw.fit(samples_scipy, floc=0.0, fscale=1.0)
a_hat_closed = powerlaw_mle_shape(samples_scipy)
# Beta equivalence check
rv_beta = beta(a, 1.0)
max_pdf_diff_beta = np.max(np.abs(rv_beta.pdf(x_grid) - pdf_scipy))
(max_pdf_diff, max_cdf_diff, a_hat_fit, a_hat_closed, max_pdf_diff_beta)
(4.440892098500626e-16,
0.0,
2.495491295398117,
2.495491295398117,
2.6645352591003757e-15)
10) Statistical Use Cases#
10.1 Hypothesis testing#
A common simple test is whether data are uniform on \([0,1]\) (\(a=1\)) versus a powerlaw alternative (\(a\neq 1\)). You can use a likelihood ratio test (LRT) or a goodness-of-fit test like KS.
10.2 Bayesian modeling#
Because the likelihood in \(a\) has the form

$$L(a) \propto a^n \exp\!\Big(a \sum_{i=1}^n \log x_i\Big),$$

a Gamma prior on \(a\) is conjugate: with prior \(\mathrm{Gamma}(\alpha_0, \text{rate}=\beta_0)\), the posterior is \(\mathrm{Gamma}\big(\alpha_0 + n,\ \text{rate}=\beta_0 - \sum_i \log x_i\big)\).
10.3 Generative modeling#
`powerlaw` is a handy one-parameter generator on \([0,1]\):

- generate “best-of-\(n\)” maxima of uniforms (exactly)
- generate random weights/thresholds with controllable concentration near 0 or 1
# Hypothesis test: H0 a=1 (uniform) vs H1 a free (powerlaw)
n = 5_000
a_true = 0.6
x = powerlaw_rvs_numpy(a_true, n, rng)
# Null log-likelihood (a=1)
ll_null = powerlaw_loglikelihood(x, a=1.0)
# MLE under alternative
a_hat = powerlaw_mle_shape(x)
ll_alt = powerlaw_loglikelihood(x, a=a_hat)
lrt = 2.0 * (ll_alt - ll_null)
p_value = stats.chi2.sf(lrt, df=1)
# KS test against fitted distribution
D, p_ks = stats.kstest(x, powerlaw(a_hat).cdf)
{"a_true": a_true, "a_hat": a_hat, "LRT": lrt, "p_value_chi2": p_value, "KS_D": D, "KS_p": p_ks}
{'a_true': 0.6,
'a_hat': 0.5879063580561444,
'LRT': 1697.6355832402796,
'p_value_chi2': 0.0,
'KS_D': 0.009659281640087447,
'KS_p': 0.735458175746772}
# Bayesian update for a with a conjugate Gamma prior
# Prior: a ~ Gamma(alpha0, rate=beta0)
alpha0 = 2.0
beta0 = 1.0 # rate
n = 2_000
a_true = 2.5
x = powerlaw_rvs_numpy(a_true, n, rng)
S = float(np.sum(np.log(x))) # negative
# Posterior: a | x ~ Gamma(alpha0 + n, rate=beta0 - S)
alpha_post = alpha0 + n
beta_post = beta0 - S
post = stats.gamma(a=alpha_post, scale=1.0 / beta_post) # SciPy gamma uses scale = 1/rate
post_mean = post.mean()
ci_95 = post.ppf([0.025, 0.975])
# Plot posterior density
a_grid = np.linspace(post.ppf(0.001), post.ppf(0.999), 400)
fig = go.Figure()
fig.add_trace(go.Scatter(x=a_grid, y=post.pdf(a_grid), mode="lines", name="posterior"))
fig.add_vline(x=a_true, line_dash="dash", line_color="black", annotation_text="true a")
fig.update_layout(
title="Posterior for a with Gamma prior (conjugate)",
xaxis_title="a",
yaxis_title="density",
)
fig.show()
{"a_true": a_true, "posterior_mean": post_mean, "ci_95": ci_95}
{'a_true': 2.5,
'posterior_mean': 2.399059202286664,
'ci_95': array([2.2951, 2.5053])}
# Generative modeling pattern: max of n uniforms equals powerlaw(a=n)
n = 5
m = 80_000
# Max of n uniforms
u = rng.random((m, n))
max_u = u.max(axis=1)
# Equivalent powerlaw with a=n
samples_pw = powerlaw_rvs_numpy(a=float(n), size=m, rng=rng)
# Compare empirical CDFs
xs1 = np.sort(max_u)
ys1 = np.arange(1, m + 1) / m
xs2 = np.sort(samples_pw)
ys2 = np.arange(1, m + 1) / m
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs1, y=ys1, mode="lines", name="max of n uniforms"))
fig.add_trace(go.Scatter(x=xs2, y=ys2, mode="lines", name="powerlaw(a=n)"))
fig.update_layout(
title=f"Max of n={n} uniforms vs powerlaw(a={n})",
xaxis_title="x",
yaxis_title="empirical CDF",
)
fig.show()
# Two-sample KS test (should not reject for large m)
ks = stats.ks_2samp(max_u, samples_pw)
ks
KstestResult(statistic=0.005637499999999962, pvalue=0.15667073167578494, statistic_location=0.9452325925703184, statistic_sign=1)
11) Pitfalls#
- Name confusion: SciPy’s `powerlaw` is bounded on \([0,1]\); heavy-tailed “power-law” behavior is better modeled by distributions like Pareto/Zipf.
- Parameter constraints: \(a>0\), `scale > 0`. Invalid values should raise errors.
- Boundary values: the canonical density is defined on \((0,1)\); exact 0 or 1 values make \(\log x\) problematic for likelihood-based methods. In practice: clip values (with care) or model measurement noise.
- Divergence at 0 for \(a<1\): the PDF goes to \(+\infty\) as \(x\to 0^+\). For plots/integration, start at a small \(\varepsilon\) like \(10^{-6}\).
- Fitting with free `loc`/`scale`: if your data are already in \([0,1]\), fix `loc=0`, `scale=1` for stable estimation of \(a\).
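The last pitfall is easy to demonstrate: compare a fit with `floc`/`fscale` fixed against a fully free fit (a small sketch; seed and sample size are arbitrary):

```python
import numpy as np
from scipy.stats import powerlaw

rng = np.random.default_rng(3)
a_true = 2.0
x = rng.random(20_000) ** (1.0 / a_true)

# With the support pinned to [0, 1], only a is estimated.
a_fixed, _, _ = powerlaw.fit(x, floc=0.0, fscale=1.0)

# A free fit must also estimate loc/scale from the sample extremes,
# which adds noise and can shift the shape estimate.
a_free, loc_free, scale_free = powerlaw.fit(x)
print(f"fixed: a={a_fixed:.3f}   free: a={a_free:.3f}, loc={loc_free:.4f}, scale={scale_free:.4f}")
```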
12) Summary#
- `powerlaw(a)` is a continuous distribution on \([0,1]\) with CDF \(F(x)=x^a\) and PDF \(f(x)=a x^{a-1}\).
- It is exactly \(\mathrm{Beta}(a,1)\), and for integer \(a=n\) it matches the distribution of \(\max\{U_1,\dots,U_n\}\) for uniforms.
- Moments are simple: \(\mathbb{E}[X]=\frac{a}{a+1}\), \(\mathrm{Var}(X)=\frac{a}{(a+1)^2(a+2)}\), and \(\mathbb{E}[X^k]=\frac{a}{a+k}\).
- Sampling is easy with inverse CDF: \(X=U^{1/a}\).
- SciPy integration is straightforward with `scipy.stats.powerlaw`, using `loc`/`scale` for interval transforms.